Crowdsourcing WordNet
Authors
Abstract
This paper describes an experiment in using Amazon Mechanical Turk to collaboratively create a sense inventory. In a bootstrapping process with massive collaborative input, substitutions for target words in context are elicited and clustered by sense; then more contexts are collected. Contexts that cannot be assigned to a sense in the target word's current inventory re-enter the loop and receive a supply of substitutions. This process produces a sense inventory whose granularity is determined by common substitutions rather than by psychologically motivated concepts. Evaluation shows that the process is robust against noise from the crowd, yields a less fine-grained inventory than WordNet, and provides a rich body of high-precision substitution data at low cost.
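The clustering step of the loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a simple greedy rule in which a context joins the first sense whose pooled substitutions share at least `min_shared` items with it, and otherwise opens a new sense. The function name, the threshold, and the example data are all hypothetical.

```python
from typing import Dict, List, Set


def cluster_by_substitutions(
    contexts: Dict[str, Set[str]], min_shared: int = 2
) -> List[Dict[str, set]]:
    """Greedily group contexts whose substitution sets overlap.

    `contexts` maps a context id to the substitutions that workers supplied
    for the target word in that context. `min_shared` is a hypothetical
    threshold, not a value taken from the paper.
    """
    senses: List[Dict[str, set]] = []
    for context_id, subs in contexts.items():
        for sense in senses:
            # Join the first sense that shares enough substitutions.
            if len(sense["substitutions"] & subs) >= min_shared:
                sense["contexts"].add(context_id)
                sense["substitutions"] |= subs
                break
        else:
            # No sense matched: this context starts a new sense.
            senses.append({"contexts": {context_id}, "substitutions": set(subs)})
    return senses


# Invented substitutions for the target word "bank" in four contexts.
example = {
    "c1": {"shore", "riverside", "edge"},
    "c2": {"lender", "institution", "financial institution"},
    "c3": {"riverside", "shore", "embankment"},
    "c4": {"institution", "lender", "building"},
}
senses = cluster_by_substitutions(example)
for sense in senses:
    print(sorted(sense["contexts"]), sorted(sense["substitutions"]))
```

With this rule, sense granularity follows directly from which substitutions contexts have in common: raising `min_shared` splits senses more finely, lowering it merges them.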
Similar resources
Building a WordNet for Sinhala
Sinhala is one of the official languages of Sri Lanka and is used by over 19 million people. It belongs to the Indo-Aryan branch of the Indo-European languages, and its origins date back at least 2000 years. It has developed into its current form over a long period of time with influences from a wide variety of languages including Tamil, Portuguese and English. As for any other language, a Wo...
Computational and Crowdsourcing Methods for Extracting Ontological Structure from Folksonomy
This paper investigates the unification of folksonomies and ontologies in such a way that the resulting structures can better support exploration and search on the World Wide Web. First, an integrated computational method is employed to extract the ontological structures from folksonomies. It exploits the power of low support association rule mining supplemented by an upper ontology such as Wor...
Validating and Extending Semantic Knowledge Bases using Video Games with a Purpose
Large-scale knowledge bases are important assets in NLP. Frequently, such resources are constructed through automatic mergers of complementary resources, such as WordNet and Wikipedia. However, manually validating these resources is prohibitively expensive, even when using methods such as crowdsourcing. We propose a cost-effective method of validating and extending knowledge bases using video g...
A Methodology for Word Sense Disambiguation at 90% based on large-scale CrowdSourcing
Word Sense Disambiguation has been stuck for many years. In this paper we explore the use of large-scale crowdsourcing to cluster senses that are often confused by non-expert annotators. We show that we can increase performance at will: our in-domain experiment involving 45 highly polysemous nouns, verbs and adjectives (9.8 senses on average), yields an average accuracy of 92.6 using a supervise...
sloWCrowd: A crowdsourcing tool for lexicographic tasks
The paper presents sloWCrowd, a simple tool developed to facilitate crowdsourcing lexicographic tasks, such as error correction in automatically generated wordnets and semantic annotation of corpora. The tool is open-source, language-independent and can be adapted to a broad range of crowdsourcing tasks. Since volunteers who participate in our crowdsourcing tasks are not trained lexicographers,...